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CONSTRUCTING CARMICHAEL NUMBERS THROUGH 
IMPROVED SUBSET-PRODUCT ALGORITHMS 

W.R. ALFORD, JON GRANTHAM, STEVEN HAYMAN, AND ANDREW SHALLUE 



^^ Abstract. We have constructed a Carmichael number with 10,333,229,505 

Cn prime factors, and have also constructed Carmichael numbers with k prime 

» . factors for every k between 3 and 19,565,220. These computations are the 

C^ product of implementations of two new algorithms for the subset product prob- 
lem that exploit the non-uniform distribution of primes p with the property 

^— H that p — 1 divides a highly composite A. 

ON 
(N 



1. Introduction 

A Carmichael number n is a composite integer that is a base-a Fermat pseudo- 
prime for all a with gcd(a,n) = 1. However, constructions of Carmichael numbers 
often rely on the following equivalent definition. 

Definition 1.1 (Korselt condition). A positive integer n is a Carmichael number 
if it is composite, squarefree, and has the property that p — 1 | n — 1 for all primes 
p dividing n. 



^- Our goal is to construct Carmichael numbers with a very large number of prime 
factors. The construction we will use is due to Erdos [1] and has been a popular 

^Q method since 1992 EH1- 

^-\ Erdos Construction: 

£Q (1) Choose A = nl=i 1i i where q\ . . .q r are the first r primes in order and the 

^j hi are all at least 1 and non-increasing. 

,-H (2) Construct the set V = {p prime : p — 1 | A,p \ A} 

^. (3) Construct Carmichael n as a product of primes in V in one of two ways: 

• »~j (a) Find a subset S of V such that 

5_i MjjeI mod A . 

^ pes 



Then by Definition 1.1 n — Y\ veS p is Carmichael. 

(b) Alternatively, let b = Jlpe-p V mod A and find a subset T of V such 
that 

I I p = b mod A . 
peT 
Then n = Y[r>ev\TP ^ s Carmichael. 
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2 ALFORD, GRANTHAM, HAYMAN, AND SHALLUE 

This notation for A will be fixed throughout, along with b as the product modulo 
A of all primes in V . Additionally, we will use Qi to represent q i i and fix the 
ordering so that q\ = 2, qi — 3 and so on. 

Choosing a good A is something of an art, seeing as how we want a number that 
is small and yet has many divisors. One possibility (taken as a starting point by 
the authors in [9) is to choose A to be highly composite [13] . We do not insist 
upon it here, instead relying on the condition that hi < r/i for 1 < i < r in order 
to prove bounds on running times. In practice, an excellent tool for choosing A is 
the following function K(A) from [5] that returns an estimate of [P\. 



K(A) = 



A J[(h.' qi - 2 



0(A) log V^^V <?i-l 



In terms of constructing V, all divisors d of A are checked to see if n = d + 1 
is prime. Primality proofs are easy since we are given the factorization of n — 1. 
They are also necessary; the second author has found pseudoprimes while testing 
primality using randomized algorithms. 

Loh and Niebuhr noted that step (2) is by far the most costly. Nevertheless, the 
improvements we present will be to step (3) , seeing as how the subset product prob- 
lem in such a dense set of instances is fertile ground for algorithmic advancement. 
In addition, step (2) is easily parallelized while step (3) is not, and step (2) requires 
almost no memory while the space requirement of most subset product algorithms 
is high. 

Our new contribution involves improvements to existing literature on a broad 
array of fronts. One new algorithm combines a series of smaller Carmichael numbers 
into larger ones, enabling quick construction of Carmichaels with a variety of factor 
counts. Another new algorithm incorporates ideas from [9] and [5] to achieve a 
randomized method that solves the subset product problem using subexponential 
time and space (in fact, sublincar in N). Computationally this has resulted in 
Carmichael numbers with 10333229505 prime factors and with k prime factors for 
every k between 3 and 19565220. 

One key insight is that elements of V are not distributed uniformly in (Z/AZ) X , 
and this non-uniformity can be exploited. Another is that it is useful to make 
products that get successfully closer to the identity in (Z/AZ) X . We will use two 
such functions for our algorithms, the first of which comes from [3]. 

Definition 1.2. Let a E (Z/AZ) X . Then oj(a) is an integer between and r defined 

by 

w(a) = max i 

a mod Qi^l 

unless a mod Qi — I for all 1 < i < r, in which case w(a) = 0. 

Definition 1.3. Let a € (Z/AZ) X . Then w(a) is an integer between and r defined 

by 

w(a) = min i 

a mod Qi^l 

unless a mod Qi — I for all 1 < i < r, in which case cu(a) = r + 1. 

Formally, the subset product problem over an abelian group is defined as follows. 

Definition 1.4. Let G be an abelian group written multiplicatively with (a\, . . . , ajv, b) 
a list of elements of G. Then the subset product problem (ai, . . . , ajv, b) is to de- 
termine a sublist of the a^ that product to b in G. 
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In the Erdos construction G is typically (Z/AZ) X , but we will also consider 
subset product problems on subgroups of (Z/AZ) X . 

The subset product problem is NP-hard, but the difficulty of a particular instance 
can vary depending on its density. The hardest problems are those of density 1. 

Definition 1.5. The density of a subset product problem (aj., . . . , ojv, b) is 

N 



log 2 |G| ' 

Problems arising from the Erdos construction will typically have density much 
larger than 1, in fact closer to 0(N/\ogN). We thus expect algorithms to exist 
with running times much faster than 0(2 ). Since so many solutions are available, 
we are free to focus on a solution with special properties that makes it easier to 
hnd. 

Many algorithms for subset sum and subset product are randomized, and hence 
rely on an assumption about the distribution of the etj. A little experimentation 
reveals that elements of V are not uniformly distributed in (Z/AZ) X , but instead 
are close to a symmetric distribution (see Section pi). 

Definition 1.6. A random variable X on a group G follows a symmetric distribu- 
tion if for every a £ G, Pr[AT = a] = Pr[A~ = a -1 ]. 

In what follows we give a new algorithm for the random subset product problem 
that works for instances of high density on groups of the form given in the above 
construction. Specifically we prove the following two theorems. 

Theorem 1.7. Let G = Go be an abelian group with subgroups G±,G2, ■ ■ ■ ,G( 

where \Gi\/\G i+1 \ = 2 e for < i < £ - 1 and i = 0og|G|. Assume that 
ai,...,ajv are independent and distributed symmetrically in G, and that N > 

0(4V lo 8l G llog|G|). 

Then there is an algorithm that solves the subset product problem (oi, . . . , ajv, b) 
with high probability and requires time and space 

<5(V log i G i 



Theorem 1.8. Let G = (Z/AZ) X and V be defined as in the Erdos construction, 
with the added assumption that 1 < hi < r/i for all 1 < i < r. Assume that 
the elements of V are independent and distributed symmetrically in G, and that 
N = \V\ = K(K). Finally, assume that the probability p = 1 mod Qi for p G V is 
at least l/(/ij + 1) and independent across Qi. 

Then there is an algorithm that with high probability finds a subset of V that 
products to b in G and requires time and space 

2 0( v /(logAf)(loglogAf)2) 

The symbol log will denote the base 2 logarithm, while In denotes the natural 
logarithm. Groups are assumed to have efficient implementations of arithmetic. 

Thanks to Eric Bach and Carl Pomerance for helpful suggestions. The third 
and fourth authors are grateful to Mark Liffiton for helpful advice and support 
regarding both hardware and software. 
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2. Previous Results 

Constructing Carmichael numbers has a long history, one that is ably docu- 
mented in [5] and |12) . We restrict ourselves to pointing out the more recent results 
that provide context for our new computations. 

The largest tabulation of Carmichael numbers is due to Richard Pinch; his tab- 
ulation up to 10 15 jT2] has since been extended to 10 16 . Afford, Granville, and 
Pomerance proved there are infinitely many Carmichael numbers in pQ. The au- 
thors credit their inspiration to |f8j , who first used the Erdos heuristic to construct 
Carmichael numbers with a large number of prime factors. Current records for 
Carmichael numbers with many prime factors go to Loh and Niebuhr [9] at 1101518 
prime factors and an unpublished computation by the first two authors at 19565300 
prime factors. 

The method of [3] clearly works well in practice, but unfortunately is without a 
running time analysis. This makes it difficult to fit into the existing subset product 
literature since it is not clear how the running time depends on N or on A. The 
algorithm exploits the fact that among elements of V, residues of 1 modulo q i i 
are more common than other residues. It divides b by p £ V in such a way that 
the running product has one more residue equal to one at each step, backtracking 
if necessary. Measuring the closeness of an element of (Z/AZ) X to the identity is 
done via Definition 11.21 

There is a large body of literature on the subset sum problem that transfers 
immediately to the subset product problem. For subset product problems of high 
density the standard technique is dynamic programming, which in this case would 
take O(NA) time and space. Since our goal is to construct Carmichael numbers 
where N is in excess of 2 30 and A is even bigger, this method is infeasible. A better 
naive algorithm is to pick a random subset and see if it products to b mod A. Even 
if the elements of V were distributed uniformly the expected time taken would be 
0((j>(A)). The polynomial time algorithm of [3] is similarly infeasible; with N so 
large we need an algorithm that is sub-linear in N. 

Wagner's algorithm for the A;-tree problem [T7] has inspired an algorithm for 
the subset sum problem that gets faster as the density increases [10l [14j [11] . A 
recent paper by Howgrave- Graham and Joux [7] even gives improvements for most 
problems of density 1. However, all these methods are exponential time and thus 
inappropriate for our setting. 

Theorem |1.7| was inspired by the Kuperberg sieve from the theory of quantum 
algorithms [8J, which has complexity 2°^v lo sl G l) where elements being matched are 
in a group G. The same idea of combining elements in pairs to zero out square root 
of the bit size at each step was presented by Flaxman and Przydatek [5], though 
they chose to only apply their algorithmic idea to a narrow slice of problems with the 
proper density to make the algorithm run in polynomial time. A better application 
to the current setting would be to pick at least 2^ logA elements of V and pair them 
up over -^/log A levels, zeroing out %/log A bits of A at each level. The algorithm 



in Theorem 1.8 does even better by applying the Kuperberg idea to a subgroup of 
(Z/AZ) X , and the surprise is that there are enough elements of V that fall in the 
subgroup so that the algorithm can still succeed. 
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3. Algorithms 



Among the two new subset product algorithms presented in this section, Algo- 
rithm nl (developed by the first two authors) is more appropriate for constructing 
Carmichael numbers with k prime factors for a variety of k, while Algorithm [2] 
(developed by the third and fourth authors) is better at constructing Carmichael 
numbers with a very large number of prime factors. Both build products of primes 
in V that get successively close to the identity in (Z/AZ) X , with Algorithm [l] uti- 
lizing Definition |1 .3| while Algorithm [2] uses Definition |1.2| 

The motivation behind Algorithm 1 was the observation by the first author that 
his implementation of the Erdos heuristic only needed to use a small fraction of the 
available primes to generate Carmichael numbers. Through repeated runs, each 
time removing the primes comprising the previous Carmichael number, it produces 
a set of co-prime Carmichael numbers where the product of any subset also forms 
a Carmichael number. Because of the inevitable difference in numbers of prime 
factors, it is likely that the sums of the individual numbers of prime factors will 
cover a wide range of possibilities. 

In order to maximize the chance of getting most of the intermediate numbers 
of prime factors, we want to generate as many different Carmichael numbers as 
possible with relatively few prime factors. We achieve this goal over r stages (one for 
each Qi), where at stage j we work with a set Sj containing products a with a)(a) = j 
(we call this set S for simplicity). Let 5i = V, and let bj = Ilaes a m °d A. For 
each element in Sj , calculate its residue modulo each of the prime powers dividing 
A. Do the same for each bj. 

First, if bj ^ 1 mod Qj find and remove the element e £ S that maximizes 
ui(e bj). As long as there is an element congruent to the product modulo Qj in 
stage j (which there will almost certainly be in high density situations), the new 
product of all primes in S will be 1 modulo Qi for all i < j. Then for each remaining 
element of a £ S, we use a greedy algorithm to find a G S maximizing uj(a ■ a). 
For simplicity Algorithm fl] shows ui(a ■ a) increasing by only one, but an important 
improvement is to efficiently find a that maximizes the increase to the Co value (see 
Section|8]for details). At the end of this process, you will have a (potentially empty) 
set of elements that were not matched. At this point, you can multiply all of these 
"chaff" together to get an element that is 1 modulo Qj , as well as being 1 modulo 
Qi for i < j. Replace S with the set- aside products, along with the product of the 
chaff, and move on to the next stage. 

At the end of the r stages, you have a collection of coprime "base" Carmichael 
numbers. It is not necessary to compute Carmichael numbers with a particu- 
lar number of prime factors to prove its existence. Instead, let v be the vector 
[1, 0, 0, . . . , 0] of length equal to one greater than the number of total prime factors 
in the Carmichael numbers. Loop over the base Carmichael numbers. For each 
number, let v' be v shifted to the right by the number of factors in that Carmichael 
number. E.g., if the first base Carmichael has 3 factors, v' = [0,0,0,1,...]. Let 
v = v + v' . Then, at the end of the loop, the fcth position in the vector will represent 
the number of constructed Carmicheals with k — 1 prime factors. (Excepting the 
first position.) 



Our next algorithm constructs a Carmichael number via Step 3b of the Erdos 
construction. As in Algorithm [I] there will be several running products; the goal is 
to get these products closer to the identity in (Z/AZ) x . However, this time elements 
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Algorithm Is Many Carmichaels subset-product 



i Si «- V ; 

2 for u <— 1 to r do 

3 sort S u ; 

4 calculate 6 U = Jlaes a m °d Qui remove a from S u that satisfies 
a = b u mod Q u ; 

5 for a e 5 U do 

6 Find a £ S u such that a • a = 1 mod Q u ; /* pushing down */ 

7 S'm+i «- ad ; 

8 Remove a, a from S ; 



product remaining a £ S u , add to 5 u +i ; 
10 return S r +i, each element of which is Carmichael ; 



are matched for Qi with i close to r first, and at each level the first element matched 
is the distinguished product containing b^ 1 modulo A. In this way the final identity 
product is 6 _1 times a number of primes from a subset T of V , making V \ T the 
factors of a Carmichael number. 

As discussed above, a heuristic application of the ideas from [S] results in a 
subset product algorithm that takes time and space 0(2 v/logA ). However, this is 
still inefficient since it does not take advantage of the large number of p £ V with 
uj(p) small. 

Let Nj be the size of the set Vj = {p £ V : w(p) < j}. Then define a subgroup 
G of (Z/AZ) X as (Z/AZ) X with A = J]™ i Qi, where 

m = min j such that E[Nj] > (log A)4"^=i ''* log? * . 

l<j<r 

We will see in Section [7] that with reasonable assumptions the expected size of Nj 
is at least N ■ Ui=j+i Vi h i + l )- 

For shorthand let i = Wlog\G\. It is important that during the construction 

of V we pick out all elements of V m and 9(2 v log l G l log \G\) elements of V with w 
values equal to j for m < j < r. The elements with u value greater than m will 
be needed to match with & _1 , so that a product of primes and b~ l has ui value 
m. This product will be called the distinguished clement a^. The elements of V m , 
along with ao, are then matched over the course of £ levels. At each level, products 
have another £ bits eliminated, so that by the end of £ levels an identity product 
has been formed. 

Pseudocode is presented as Algorithm [2] 

Constructing subgroups of the correct size is not too hard, since \G\ will typically 
be products of small primes to large powers. As an example, note that if G = 
(Z/2 /ll Z) x then G' = {a £ G : a = 1 mod 2 e } is a subgroup of G of order 
2^1 -e anc L nence index 2 e_1 . Thus for this example of G we have Go = G and 
G l = {a=lmod2 4 Lv / IG|J}. 
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Algorithm 2: Large Carmichael subset-product 



/* Phase 1: find/construct elements of G */ 



1 Input: set S containing V m and (log |G|)2v lo sl G l primes p with w(p) = j for 
each of m < j < r ; 

2 %f- b^ 1 mod A ; /* b is product of all elements of V */ 

3 T = ; 

4 for u <- r to m do 

5 find p G Tu such that a Q ■ p = 1 mod Q„ ; 

6 ao <— a • p mod A, T <— T U {p} ; 

7 add ao to S 1 ; 

/* Phase 2: continually pair products to reach identity in G */ 

8 construct subgroups Gi, 1 < i < £ with factor groups having size 2 ; 

9 for i <— 1 to £ do 



10 

n 



pair the element containing ao first to ensure it is included 
pair elements of S whose product is in Gi ; 



12 return the history of any element in Ge = {1q} 



4. Symmetric Distributions 

Algorithms [T] and [2] are not guaranteed to succeed. Rather, they will succeed 
with some positive probability depending upon the distribution of the elements 
of V . The key result proven in this section is that if the elements of V are dis- 
tributed symmetrically, then the probability that the product of two elements is in 
a subgroup is at least as large as if the elements were distributed uniformly. 

First, however, we mention the tail bound that will be used frequently in the 
analysis that follows. Since products at any level are composed of distinct ele- 
ments of V, if the initial elements are independent then subsequent products are 
independent as well. 

Theorem 4.1 (Chernoff bound). Let X\, X2, ■ ■ ■ , X n be independent Bernoulli tri- 
als that take value 1 with probability pi. Let X — Y)j—i Xi, fi = E[AT], and 5 be any 
real number in the range (0, 1] . Then 

Pr[AT < (1 - S)fi] < cxp(-/i(5 2 /2) . 

A classical result is that collision probability is minimized when the distribution 
is uniform. For a proof see [TBI P a g e 66] . 

Lemma 4.2. Let X be a random variable on a set S. Then 

£Pr[x = «]°>£(^ 2 

a£S aeS Vl ' 



Theorems |1.7| and |1.8| assume that the given distribution is symmetric. Our 
definition is a simple generalization of that in [5] . 

Definition 4.3. The distribution of a random variable X on G is symmetric if 
Pr[X = a] = Py[X = a^ 1 ] for all a € G. In this case we call X a symmetric random 
variable. 



8 ALFORD, GRANTHAM, HAYMAN, AND SHALLUE 

Recall that all groups under consideration are abelian. If if is a subgroup of G 
and X is a random variable on G then there is a natural random variable on G/H 
given by sampling X and then mapping the result to G/H via the map g H > gH . 
Call this random variable Xjj . A specific example occurs when G = (Z/AZ) X , we 
sample X and want the result modulo Qi. Call this random variable X mod Qi. 
Mapping to G/H preserves the symmetric property. 

Proposition 4.4. Let H be a subgroup of G. If X is symmetric then Xh is 
symmetric. 

Proof. Let 4> : G — > G/H be the canonical homomorphism. Then 

Pr[X H =aH}= ^ Pr[A = a] = ^ Fr i X = °] = J2 Pr ^ X = a ^ 

aEaH a- 1 Ea- 1 H aEa,- 1 !! 

since <f> a homomorphism implies that a £ aH if and only if a -1 G a~ 1 H . The fact 
that X is symmetric then yields 

^ Pt[X = a- 1 } = V Pr[A = a] = Pr[A H = a^H] . 



a&a,~ 1 H a^a-^H 



□ 



Constructing a new group via direct product also preserves the symmetric prop- 
erty for random variables. If X\, X 2 are independent random variables on Hi, H 2 
respectively, then let X\ x X 2 be a random variable on G — H\ x H 2 . We define 
this random variable by 

Pr[Xi x X 2 = (a, b)} = Pr[Xi = a] ■ Pr[A 2 = b] . 

Proposition 4.5. If Xi, X 2 are symmetric random variables on Hi,H 2 respec- 
tively then Xi x X 2 is symmetric on G = Hi x H 2 . 

Proof. Follows immediately from the fact that (a,6) _1 = (a _1 ,& _1 ). 

a 

Products of random variables also preserve the symmetric property. 

Proposition 4.6. Let X\, X 2l . . . , X n be symmetric random variables on G. Then 
Y = Yii=i x i is also a symmetric random variable on G. 

Proof. Using the symmetric nature of each of the Xi we have the following identity 
involving multiple sums. 

Pr[F = b] 
= J2 Pr[^i=a 1 ]---Pr[X„_ 1 = a n _ 1 ]Pr[X n = 6-(a 1 o 2 ---a n _ 1 )- 1 ] 

= Y, Pr[Ai=ar 1 ]---Pr[^n-i-a,Ti 1 ]Pr[X n = 6- 1 -( aia2 ---a„_ 1 )] 

ai,...a n -i£G 

= Pr[F = b- 1 } . 

a 

Phase 2 of Algorithm [2] involves matching elements whose product is in some 
subgroup H of G. Weakening the definition of collision to having X\ ■ X 2 in a 
subgroup H yields a similar result to Lemma |4.2| symmetric random variables 
have a greater than uniform collision probability. 
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Proposition 4.7. Let H be a subgroup of G and let X\, A 2 be independent random 
variables on G with identical symmetric distributions. Then the probability that 
X± ■ X2 is in H is at least \H\/\G\. 

Proof. Let C be a set of coset representatives of H . By group theory, C has size 



|G|/|i/|. Xi mod H is symmetric by Proposition 4.4 and thus Pr[Xi 6 aH] 
Pr[A 2 e a- x H] for all a in C. Then 



Pr[Xi • X 2 e H] = 2_^ Pt[Xi = a] Pr[A 2 € a _1 #] 
= Y^ Pr [^i € off] Pr[A 2 e a _1 #] 



aGC 

= ^ Pr[Xi G ail] 2 

aGC 

where the lower bound follows from Lemma 14.21 □ 

5. Divisors of A 

L6h and Niebuhr [Sj note the distribution of divisors of A modulo qi , but provide 
no proof. Here we give a full description of the distribution modulo Qi = q^ 
in order to provide justification for the claim that elements of V are distributed 
symmetrically modulo A. 

We begin with the following lemma, then extend the distribution to all classes 
modulo q i * . 

Lemma 5.1. Suppose that a divisor d of A is chosen uniformly at random from 
the set of all divisors of A. Then 

Pr[d exactly divisible by qf] = - 

for all 1 < i < r and all 1 < e < hi. 

Proof. The divisors of A can be identified with r-tuples (ei, . . . ,e r ), where e^ is 
power of qi that divides d. It is thus the case that when the divisors are partitioned 
by the power of qi, there are hi + 1 such partitions and they are all of the same 
size. □ 

If we now assume that the part of a divisor relatively prime to qi is uniformly 
distributed modulo Qi, then the l/(hi+l) probability of all divisors exactly divisible 
by qf is shared equally among the gf i_e — gf i_e_1 elements of Z/Q t Z which are 
exactly divisible by qf. This yields the heuristic distribution 

f ^ if a = 

Pr[d = a mod gf*l = < \ ., e ,, ,. ., 

,, ,.,,, hi - e — hi-c-i. if qf exactly divides a 

It is easy to show that if d + 1, a" + 1 are multiplicative inverses in (Z/QjZ) x 
and of exactly divides d then qf exactly divides d' + 1. Thus the distribution of 
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X = d-\- 1, d | A is symmetric on (Z/QiZ) x , and Proposition 4.5 extends this result 
to(Z/AZ) x . 

A difficult question is whether the distribution remains symmetric with the added 
condition that d + 1 be prime. We will assume it does, but it is worth noting that 
elements of V do not have the same distribution as divisors of A. For example, if 
d = kqi — 1 mod q i * for some k then d + 1 is divisible by qi and thus not prime 
(note this particular problem does not jeopardize the claim that V is distributed 
symmetrically over (Z/AZ) X ). 

Analysis of Algorithm [2] will depend upon the following heuristic assumptions, 
which we will henceforth call the standard assumptions. 

Heuristic 1 (Standard assumptions). Let X p be a random variable corresponding 
to the value modulo A of p 6 V . 

(1) The X p are symmetric random variables on (Z/AZ) X . 

(2) The X p are independent, as are X p mod Qi for different 1 < i < r. 

(3) The probability that X p = 1 mod Qi is at least l/(hi + 1). 

(4) A is constructed so that 1 < hi < r/i for all 1 < i < r. 

(5) The size of V is K(A). 

6. Algorithm analysis 



The following theorem provides the proof for Theorem 1.7 and is general enough 



to be applicable to many settings besides constructing Carmichael numbers. We 
use the notation £ for ^/log 2 |G|, and follow closely the proof of Theorem 3.2 from 
0- 

Theorem 6.1. Let G = Go be an abelian group with subgroups G\, G2, ■ ■ -,G]> 
where |Gi|/|G i+ i| = 2 e for < i < £ — 1. Suppose that S contains at least 0(£ 2 4 e ) 
independent elements of G distributed symmetrically. Then Phase 2 of Algorithm^ 
finds a solution to the subset product problem with probability at least 1 — e~ f2 ( 1 °sl c »I) 
using time and space 0(4v og 2 l G l). 

Proof. We begin by assuming the algorithm has been successful up to level u, so 
that we have a list L u of elements in the group G u . Our goal is to prove by induction 
that 

\L u \>C u -£ 2 2 2e - u 

with high probability, given that \L \ = Co£ 2 2 2i . Here C u is defined recursively by 
Co = 3,C„ = C u _i - 2-( e - u \ For < u < £ we have 1 < C u < 3. As long as 
\Li\ > 1, then Phase 2 succeeds in finding a solution. 

Given a G L u , let X\, be a Bernoulli random variable that takes value 1 if a and b 
"match," that is if a-b € G u +\ . Then a has a match in L u as long as X = J^b A^b > 1- 
Elements of G u are products of symmetrically distributed elements of S, and thus 
are symmetrically distributed themselves by Proposition |4.6| It then follows from 



Proposition 4.7 that Xb = 1 with probability at least |G u |/|G u +i| = 2 . Thus 
E[X] > £ 2 until all but £ 2 2 l elements are matched. If this holds then 

|l u+1 | > \h^pL > ^-„_i . ^ _ 2 - { t-u) 

The products are distinct and hence independent, so the Chernoff bound applies. 
The probability that a fails to match is at most 

Pr[AT < £ 2 /2] < Pr[AT < (1 - 1/2)E[X]] < exp(-E[X]/8) < cxp(-£ 2 /8) . 
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We seek to make at most £ 2 i matches, so all will succeed with probability at least 

(1 - e ^ 2 / 8 f 4 ' > 1 - tH e e~" 2 / 8 . 
The probability of all matches succeeding at all £ levels is then at least 



(1 - e-^ +u f > 1 - £e-^ +3i > 1 - e-^ 2 ) = 1 - e-^slGD 



7. Bounding the Running Time 



□ 



The previous section gave a subexponential analysis of Phase 2 running on a 
general group G. Here we bound log \G\ in terms of N so that the running time of 
Algorithm [2] can be expressed in terms of the problem size, proving Theorem 1.8 



In fact, an easy bound on log A would be sufficient asymptotically, but to measure 
the gain from having G be a subgroup of (Z/AZ) X we go further and prove that 
m = 0(-\/r log r). Throughout we will take the standard assumptions as given. 
Our starting point is item (5) of the standard assumptions, namely that 

- A nf*. 



<f>(A)lnV2AfJi\ ft-1 

and which we denote by N. Recall that G is defined as the integers modulo A = 
Y\T=i Qi i wnere m is the smallest integer such that N m is large enough. Here N m 
is the number of elements of V with u>(ja) < to and "large enough" means E[iV m ] 
is large enough for Phase 2 to succeed with high probability. By our standard 
assumptions the probability that an element of V is congruent to 1 modulo q i i is 
at least h 1 +1 and this condition is independent for different values of i. Hence the 
condition on E[AL,1 becomes 



hi log 



^(A)ln^fjv 1 Ui-lJJX tH + 1 

Clearly 1 < m < r and a smaller m is preferred so we will provide an upper bound. 
The analysis requires m > 10 and r > 64. 

Bounding log \G\ requires several preparatory results. First we prove that log A 
is polynomial in r. 

Lemma 7.1. Assume that 10 < m < r and that 1 < hi < r/i for 1 < i < r. Then 

m 

m < 2_. ^ log qi < 2r(log m) 2 . 

8 = 1 

Proof. The qi are the first r primes and 1 < hi for all i. Thus X)j=i ^ ^°SQi > m - 
For the upper bound we require a bound on the mth prime number. From 
Section 8.8] this is given by q m < m(lnm + In mm) for m > 6. With m > 6 we use 
the conceptually easier bound of log q m < 2 log m. Since hi < r/i this yields 

m m _. 

2_] hi log qi < 2r log m 2_\ - < 2r log m ■ (1 + In m) 

4=1 i=\ 

and (1 + In ?7i ) < log to for to > 10. □ 
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An immediate corollary is that log A < 2r(logr) 2 . Next we find that r is loga- 
rithmic in N. 



Lemma 7.2. Assume that r > 64. Then log N > r/3. 

no Torm n ■ — L 

g<-i 



Proof. We start with the definition of A given above. The term ftj + ^zr is bounded 



below by 3/2 for all & > 3. Since A/0(A) > 1 and In \/2A < log A < 2r(logr) 2 by 
Lemma |7. II we have 

_ A py / gt -2 \ 1_ /3 N 

~ 0(A)lnv^A fj V * + ft-lj > 2r(logr) 2 ^2 

Thus log A > (r - 1) log (3/2) - (1 + log r + 2 log log r) > r/3 when r > 64. □ 

When bounding E[A m ] a sticky term is Yl(hi + l)/(^i + jjrEf )• ^ ought to be 
close to one; we provide a bound logarithmic in r. 

Lemma 7.3. Assume r > 16. Then for all m > 1 

n j^k < ^ 2 • 

j=m+l 3 qj — l 
Proof. Start with the transformation 

hj+l _ h ^ + & + 1 -& =1 l/fe-l) =1 , 1 



fc, + gE? ^ + f^f % + l^f fe-i^ + %-2- 

By assumption qj > 3 and hj > 1, so (qj — l)/ij > 2 for all 2 < j < r, giving the 
bound 

1 + X <1 + I. 

{ qj - l)hj + qj - 2 q 3 

We saw in Lemma 7.1 that r > 6 implies q r < 2rlnr. We now use a result from [3J 
Section 8.8] on the prime reciprocal sum, namely 

V-< y - <lnln(2rlnr) + 5 + - — . 

^ a, ^ o (ln(2rlnr)) 2 

3 = 1 y J g<2rlnr y \ \ JJ 

where the sums are over primes and B < 0.27 is the prime-reciprocal constant. 
This upper bound is less than 2 In In r as long as r > 16. The slope of the tangent 
line to y = e x at x — is 1, which means e x > 1 + x for all positive x. With 
x = 1/qj this gives 

rj ( H J < exp y^ — 1 < exp(21nlnr) = (lnr) 2 . 



j—m-\-l 



a 



With this preparatory work out of the way the bound on log |G| follows from 
properly bounding E[A m ], the expected number of elements of V with ui- value m. 

Lemma 7.4. Assume r and N are sufficiently large. Given the standard assump- 
tions, 

log|G| < 271ogA(loglogA) 2 . 
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Proof. Recall that G is defined as the group of units modulo Yii=i 1i ' where m is 

the smallest integer such that ATLLm+i h~+i > ( lo S A)W loglGl . Since m is the 
smallest such integer, multiplying by l/(h m + 1) flips the inequality. Thus 

1 « m / r,\ r h ■ -L *?'~ 2 

Focusing on the left hand si de, h m < r by construction, A/</>(A) > 1, ln\/2A < 



log A < 2r(logr) 2 by Lemma 7.1 and ]J(hi + 2i 3y)(/i, + 1) > (lnr) 2 by Lemma 



7.3 Combining all these bounds yields 



1 1 1 TT /, , <?i - 2 



r + 1 2r(logr) 2 (b 


ir) 2 


m / 




4) 



n(^+|^j<(iogA)4vra 



- (2 log 2r + 4 log log r) < log log A + 2 



\ 



Y^ hi log ( 



i log (3/2) - 3 log r < log log A + 2 v /2r(logm) 2 



7.1 



where J^Li ^« l°g9i < 2r(log?n) 2 follows from Lemma 

Ifm > 6-v/r logr then mlog (3/2) > 2\/2r log m+log log A+3 log r for sufficiently 
large r so we must have m < 6^/r\ogr. 

Since log |G| < 2r(logm) 2 this bound on m gives us log \G\ < 2r(log6+ \ log?' + 



log logr) 2 which is at most 2r(|logr) 2 if r > 32. Lemma 7.2 now completes the 
proof. 

D 

Although asymptotically log \G\ and log A are equivalent at 0(logiV(loglogiV) 2 ), 
this work showing m — 0(^/r\ogr) provides a theoretical justification for the gains 
seen in practice. We now prove Theorem |1.8[ restated here for convenience. It 



was proven that m < 6-^/rlogr in Lemma 7.4 so the necessary assumption that 
m > 2 ^fr logr causes no harm. 

Theorem 7.5. Let G = (Z/AZ) X and V he defined as in the Erdos construction, 
with the added assumption that 1 < hi < r/i for all 1 < i < r. Assume that 
the elements of V are independent and distributed symmetrically in G, and that 
N = \V\ = K{K). Finally, assume that the probability p = 1 mod Qi for p G V is 
at least l/(hi + 1) and independent across Qi. 

Then there is an algorithm that with probability at least 1 — e ( log ) finds a 
subset of ' V that products to b in G and requires time and space 

2 0( v / (logiV)(logIogAf) 2 ) 

Proof. First assume a solution is found and that it includes ao, the distinguished 
element of G. Then b^ 1 mod A times some product of primes is the identity in G, 
and since each a^ is the identity in (Z/AZ) X /G, we have discovered a set of primes 
in V that product to b modulo A. 

In bounding the probability of failure we focus first on the size of V m and the 
distinguished element ao- We notated \P m \ by N m , and chose G so that 

E[N m ] > (logA)4\ /i °slG| . 
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For p £ V let X p be a Bcrnouilli random variable that takes value 1 if t-o(p) = m, 
and let X be the sum of all X p . Then E[X] = E[7V m ] and by the Chernoff bound 

Pt[X < (1 - l/2)E[iV m ]] < exp(-E[JV m ]/8) 

logA x 



Pr 



X < i(logA)4V lo sl G l 



< exp 



The distinguished element is dealt with in a different fashion. For levels u = r to 
m + 1 wc multiply 6 by an element p such that w(p) = u and p = 6 mod g^™ . Focus 
on level it and let Y p be a Bernoulli random variable that is 1 if it satisfies that 
condition, with Y being the sum over all Y p . The worst case is at level m + 1, where 

PrK = " - ^a- -V-,v- n , *tt s t^ 7 n , xttt 

%+l ^777 + 1 7=777+2 ^777 + 1 7 — 777 + 2 



and hence 



TV Jt 1 1 4V /l °sl G l 

E P1^-S^r II 7T- T>^+T- E [^]>^7-+T-- lo g A 

ffni+l 7=777 + 2 "« + X 9,77+1 C+l 



We will show (/i m+1 log 9m+ i) 2 < 4^™ i hi lo g* and hence that 9,n+T < 4^ log|G| . 
Since the hi are non-increasing, 4^,- =1 hilogqi > 4m/i m +i. Meanwhile, 

/i m+ i(logg m+ i) 2 < — — - • (2 log (m + l)) 2 < 4m 

777, + 1 

since m > 2y / r logr implies m 2 + m > 4r(logr) 2 . 

Now the Chernoff bound can again be applied, giving 

Pr[F < (1 - 1/2) (log A)] < exp (-log A/8) . 

This is the worst case among the at most r levels, which means the total success 
probability of Phase 1 is at least 

(1 - e _n(lQgA) )(l - e - n(logA) ) r > 1 - (r + i^-^sA) > i - (3 log N + i) e -"( lo s A ) 

where r < 3 log N follows by Lemma |7.2| 



With 5 containing 0((log A)4V /l °sl G l) elements, Theorem 



G.l 



allows us to con- 



clude that Phase 2 completes with probability at least 1 — e 42 ( lo e A ), Note that 



the distinguished element is distributed symmetrically by Proposition 4.6 and that 
Theorem |6 . 1 1 assumes it is always the first to be matched. The total probability of 
success is thus also 1 — e~°( logA ). 

Finally we note the running time. We start the algorithm with at most 
0(r(log A)4v lo sl G l) elements in S. Finding the right product to put b~ x in G 
takes r searches at a cost of 0(yJlog \G\) each. Phase 2 takes time and space 



0((log A)4v lo sl G l). Applying Lemma 7.4 then gives a total resource usage of 



2 O( v /(logiV)(loglogA0^ n 

8. Construction 

Algorithm [T] was implemented in Python and run on a T3E. The best result 
so far is that Carmichael numbers have been constructed with k prime factors for 
every k between 3 and 19565220. 

It is important to detail how one can efficiently "push down" several primes in 
one step. Let a be an element of the set S at stage j. We seek a that maximizes 
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Table 1. Large Carmichael numbers 

A = 2 15 -3 8 -5 5 -7 4 - ll 3 -13 2 -17 2 -19 2 -23 2 -29-31-37-41-43-47-53- 59-61-67- 71-73-79 

= 288828494392627542423975683172283292832395366400000 

k = 1021449117, \V\ = 1021449926, K(A) = 1009441849 

n = ... 428427434862714073557504000001, \T\ = 809, d = 25564327391 

Subset Product Time = 174 sec 

A = 2 16 -3 7 -5 5 -7 4 -ll 3 -13 2 -17 2 -19 2 -23 2 -29 2 -31-37-41-43-47-53-59-61-67-71-73- 79-83-89-97 

= 4001166357176246301338040166195304168348080373267865600000 

k = 10333229505, \V\ = 10333230324, K{K) = 10225023621 

n = ... 7953631560231294017777459200001, \T\ = 819, d = 295486761801 

Subset Product Time = 98 sec 



u>(a ■ a). For a set A let A = {a -1 mod A : a E A}. Then combine the sets 
S' = {a : a <G S and a < a^ 1 mod A} and S\S' and sort lexicographically on the 
list of residues. Then for our element a, the element of S \ S' closest to a in the w 
metric will be the element immediately following a or preceding a. We use S' rather 
than combining S and S because in the latter case we end up with duplicates. Once 
you find the associated a, if the maximum possible value of w(a • a) at the jth stage 
is j, proceed to the next element. If it is greater than j, then remove both a and b 
from 5, multiply and set aside. 

Algorithm [2] was implemented using C++ and NTL [15] and run on a 3.3 GHz 
processor with 16 GB of main memory. Two Carmichael numbers are presented in 
Table [ll Here, k is the number of prime factors of n and d denotes the number of 
decimal digits of n, while T is the set output by Algorithm [2] and removed. As con- 
jectured in [5], K(A) is within 3% of \V\. The last thirty digits of each Carmichael 
number are also included. The billion prime case took more time because the 
method of calculating G resulted in Phase 2 operating on 27.7 million primes, as 
opposed to 16.5 million for the ten billion factor case. This neatly demonstrates 
how the number of primes needed for Phase 2 grows more slowly than N . 

For Algorithm [2] the most important implementation detail was the instantiation 
of elements of V. Since such primes have the property that p—1 divides A, one can 
store only the exponents of the divisors of A, thereby fitting primes into a single 
64-bit long for all cases under consideration. 

However, one cannot multiply elements of V and maintain the property that one 
less than the element divides A. We created a class ModElement which encapsulates 
an integer modulo A, a vector of condensed elements of V that product to the integer 
(called the "history"), and methods that product such elements or compute its 00 
value. In this way, when some element is the identity, the history of that element 
gives the solution to the subset product problem. 

We implemented the subgroups Gi as integers modulo Mi where Mi is an appro- 
priate divisor of A. Ease of implementation made this an attractive alternative to 
a generic group model, but one disadvantage is that Mi is sometimes larger than 
the order of the subgroup it represents. For example, if 5 2 divides A and 5 divides 
Mi, then M i+1 will be divisible by 5 2 in order that products at the next level be 
congruent to 1 mod 25, and hence be five times too big. 
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A significant improvement to Algorithm [2] comes from passing elements congru- 
ent to 1 modulo Mi to the next level without matching. Since elements congruent 
to 1 modulo qf are more common, observed sizes of Li are larger than L,_i/2. 

As for the underlying data structure, the necessary requirements are that one 
have constant time insertion and removals, and that one can quickly find elements 
with a particular value modulo Mj. Our solution was a hash table of ModElements 
keyed to the residues modulo Mj of the value. 

Even though this data structure makes each prime expensive to store, the total 
memory needed is quite manageable due to the relatively small number of primes 
needed out of the total. For example, in the ten billion case we use only about 16.5 
million primes, since G = (Z/AZ) X where A = 2 16 -3 7 -5 5 -7 4 -ll 3 -13 2 -17 2 -19 2 -23 2 - 
29 2 -31-37-41-43-47-53-59. 

9. Future Work 

In [6] the authors extend the basic Erdos construction to a variety of other 
pseudoprimes. Most likely the methods in this paper can be extended as well. 
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